rt(threaded): basic self-tuning of injection queue #5720

Merged
merged 25 commits into from
Jun 1, 2023

carllerche
Member

Each multi-threaded runtime worker prioritizes pulling tasks off of its local queue. Every so often, it checks the injection (global) queue for work submitted there. Previously, "every so often," was a constant "number of tasks polled" value. Tokio sets a default of 61, but allows users to configure this value.
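For illustration, the fixed-interval behavior described above can be sketched roughly as follows. This is a simplified model, not Tokio's code: `Worker`, `next_task`, and `polls_before_injected` are hypothetical names, and tasks are represented by plain integers.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a runtime worker; not Tokio's actual type.
struct Worker {
    local: VecDeque<u32>,
    tick: u32,
    /// Number of ticks between forced checks of the injection queue.
    /// Assumed to be non-zero.
    global_queue_interval: u32,
}

impl Worker {
    /// Picks the next task. The local queue is preferred, except on every
    /// `global_queue_interval`-th tick, when the injection queue goes first.
    fn next_task(&mut self, injection: &mut VecDeque<u32>) -> Option<u32> {
        self.tick = self.tick.wrapping_add(1);
        if self.tick % self.global_queue_interval == 0 {
            injection.pop_front().or_else(|| self.local.pop_front())
        } else {
            self.local.pop_front().or_else(|| injection.pop_front())
        }
    }
}

/// Queues `local_len` local tasks plus one injected task and returns how
/// many local tasks run before the injected one is picked up.
fn polls_before_injected(local_len: u32, interval: u32) -> usize {
    const INJECTED: u32 = u32::MAX;
    let mut injection = VecDeque::from(vec![INJECTED]);
    let mut worker = Worker {
        local: (0..local_len).collect(),
        tick: 0,
        global_queue_interval: interval,
    };
    let mut ran = 0;
    while let Some(task) = worker.next_task(&mut injection) {
        if task == INJECTED {
            return ran;
        }
        ran += 1;
    }
    unreachable!("the injected task is popped before both queues drain")
}

fn main() {
    let waited = polls_before_injected(100, 61);
    println!("injected task waited behind {} local polls", waited);
}
```

With a fixed interval, the wait for an injected task scales with how long each local poll takes, which is the starvation scenario this PR targets.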

If workers are under load with tasks that are slow to poll, the injection queue can be starved. To prevent starvation in this case, this commit implements some basic self-tuning. The multi-threaded scheduler tracks the mean task poll time using an exponentially-weighted moving average. It then uses this value to pick an interval at which to check the injection queue.
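A minimal sketch of the tuning idea, assuming a target wall-clock budget between injection queue checks. The constants and the `PollStats` type below are illustrative assumptions, not values or names taken from this PR.

```rust
use std::time::Duration;

// Illustrative assumptions only; the PR's actual constants may differ.
const TARGET_INTERVAL: Duration = Duration::from_micros(200);
const EWMA_ALPHA: f64 = 0.1;
const MIN_POLLS: u32 = 2;
const MAX_POLLS: u32 = 127;

/// Hypothetical tracker for the mean task poll time.
struct PollStats {
    mean_poll_nanos: f64,
}

impl PollStats {
    fn new(initial_mean: Duration) -> Self {
        PollStats {
            mean_poll_nanos: initial_mean.as_nanos() as f64,
        }
    }

    /// Folds one observed poll duration into the exponentially-weighted
    /// moving average: new = alpha * sample + (1 - alpha) * old.
    fn record_poll(&mut self, elapsed: Duration) {
        let sample = elapsed.as_nanos() as f64;
        self.mean_poll_nanos =
            EWMA_ALPHA * sample + (1.0 - EWMA_ALPHA) * self.mean_poll_nanos;
    }

    /// Number of polls that fit in the target interval at the current mean
    /// poll time, clamped to a sane range.
    fn tuned_interval(&self) -> u32 {
        let target = TARGET_INTERVAL.as_nanos() as f64;
        // Float-to-int casts saturate, so a zero mean maps to MAX_POLLS.
        let polls = (target / self.mean_poll_nanos) as u32;
        polls.clamp(MIN_POLLS, MAX_POLLS)
    }
}

fn main() {
    let mut stats = PollStats::new(Duration::from_micros(1));
    println!("fast polls: check every {} polls", stats.tuned_interval());

    // Simulate a run of slow polls; the interval shrinks as the mean grows.
    for _ in 0..200 {
        stats.record_poll(Duration::from_micros(100));
    }
    println!("slow polls: check every {} polls", stats.tuned_interval());
}
```

The point of the design is that slow polls shrink the interval, so the injection queue is checked at a roughly constant rate in wall-clock time rather than after a constant number of polls.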

This commit is a first pass at adding self-tuning to the scheduler. There are other values in the scheduler that could benefit from self-tuning (e.g. the maintenance interval). Additionally, the current-thread scheduler could also benefit from self-tuning. However, we have reached the point where we should start investigating ways to unify logic in both schedulers. Adding self-tuning to the current-thread scheduler will be punted until after this unification.

With this change, I can now add the benchmark mentioned in #5712. This change brings that benchmark down from about 2s to about 100ms.

@carllerche carllerche added A-tokio Area: The main tokio crate M-runtime Module: tokio/runtime T-performance Topic: performance and benchmarks labels May 24, 2023
@github-actions github-actions bot added the R-loom Run loom tests on this PR label May 24, 2023
@carllerche carllerche force-pushed the rt-inject-interval-tuning2 branch from 9b49b59 to df96c16 Compare May 24, 2023 23:27
// was called, turning the poll into a "blocking op". In this
// case, we don't want to measure the poll time as it doesn't
// really count as an async poll anymore.
core.metrics.end_poll();
Member Author

These changes might be a bit controversial, but when we poll the LIFO slot, we aren't "really" polling a new task. Instead, we are batching the poll of the LIFO task under the initially polled task.
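As a rough sketch of that batching, assuming a simplified worker where tasks are boxed closures: one measurement covers the initial task plus anything it hands to the LIFO slot. `Worker`, `Task`, `boxed`, and `run_task` are hypothetical names, not the identifiers used in this PR.

```rust
use std::time::{Duration, Instant};

/// Hypothetical task type: a closure that may hand work to the LIFO slot.
type Task = Box<dyn FnOnce(&mut Worker)>;

/// Helper that boxes a closure as a `Task`.
fn boxed(f: impl FnOnce(&mut Worker) + 'static) -> Task {
    Box::new(f)
}

/// Hypothetical, simplified worker; not Tokio's actual type.
struct Worker {
    lifo_slot: Option<Task>,
    polls_recorded: u32,
    time_recorded: Duration,
    log: Vec<&'static str>,
}

impl Worker {
    fn new() -> Self {
        Worker {
            lifo_slot: None,
            polls_recorded: 0,
            time_recorded: Duration::ZERO,
            log: Vec::new(),
        }
    }

    /// Runs `task` and then drains the LIFO slot. The whole batch is
    /// measured once and recorded as a single poll.
    fn run_task(&mut self, task: Task) {
        let start = Instant::now();
        task(&mut *self);
        // Tasks found in the LIFO slot run under the same measurement.
        // This sketch does not bound the loop; a real scheduler would.
        while let Some(next) = self.lifo_slot.take() {
            next(&mut *self);
        }
        self.polls_recorded += 1;
        self.time_recorded += start.elapsed();
    }
}

fn main() {
    let mut worker = Worker::new();
    worker.run_task(boxed(|w| {
        w.log.push("parent");
        w.lifo_slot = Some(boxed(|w| w.log.push("child")));
    }));
    println!(
        "ran {:?} as {} recorded poll(s) in {:?}",
        worker.log, worker.polls_recorded, worker.time_recorded
    );
}
```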

Member

i think this feels reasonable. can we add a comment explaining that rationale?

Member Author

done

@carllerche
Member Author

I will work on fixing CI.

I will also investigate tests, but I think that might be hard because they are timing based.

@hawkw hawkw self-requested a review May 25, 2023 19:28
@Noah-Kennedy
Contributor

We should probably test this change on a few different workloads to see how it performs.

@carllerche
Member Author

@Noah-Kennedy Go for it. Tokio's benchmarks (except the one this targets) remain unchanged (margin of error).

Member

@hawkw hawkw left a comment

the implementation looks really solid! i had a few minor nitpicks but nothing blocking.

benches/rt_multi_threaded.rs (resolved)
tokio/src/runtime/builder.rs (resolved)
tokio/src/runtime/scheduler/multi_thread/stats.rs (outdated, resolved)
tokio/src/runtime/scheduler/multi_thread/worker.rs (outdated, resolved)
tokio/tests/rt_threaded.rs (resolved)
@carllerche carllerche merged commit 79a7e78 into master Jun 1, 2023
@carllerche carllerche deleted the rt-inject-interval-tuning2 branch June 1, 2023 15:13
carllerche added a commit that referenced this pull request Jun 1, 2023
PR #5720 introduced runtime self-tuning. It included a test that
attempts to verify self-tuning logic. The test is heavily reliant on
timing details. This patch attempts to make the test a bit more reliable
by not assuming tuning will converge within a set amount of time.
carllerche added a commit that referenced this pull request Jun 2, 2023
PR #5720 introduced runtime self-tuning. It included a test that
attempts to verify self-tuning logic. The test is heavily reliant on
timing details. This patch attempts to make the test a bit more reliable
by not assuming tuning will converge within a set amount of time.
crapStone pushed a commit to Calciumdibromid/CaBr2 that referenced this pull request Jul 6, 2023
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [tokio](https://tokio.rs) ([source](https://github.com/tokio-rs/tokio)) | dependencies | minor | `1.28.2` -> `1.29.1` |
| [tokio](https://tokio.rs) ([source](https://github.com/tokio-rs/tokio)) | dev-dependencies | minor | `1.28.2` -> `1.29.1` |

---

### Release Notes

<details>
<summary>tokio-rs/tokio (tokio)</summary>

### [`v1.29.1`](https://github.com/tokio-rs/tokio/releases/tag/tokio-1.29.1): Tokio v1.29.1

[Compare Source](tokio-rs/tokio@tokio-1.29.0...tokio-1.29.1)

##### Fixed

-   rt: fix nesting two `block_in_place` with a `block_on` between ([#5837])

[#5837]: tokio-rs/tokio#5837

### [`v1.29.0`](https://github.com/tokio-rs/tokio/releases/tag/tokio-1.29.0): Tokio v1.29.0

[Compare Source](tokio-rs/tokio@tokio-1.28.2...tokio-1.29.0)

Technically a breaking change, the `Send` implementation is removed from
`runtime::EnterGuard`. This change fixes a bug and should not impact most users.

##### Breaking

-   rt: `EnterGuard` should not be `Send` ([#5766])

##### Fixed

-   fs: reduce blocking ops in `fs::read_dir` ([#5653])
-   rt: fix possible starvation ([#5686], [#5712])
-   rt: fix stacked borrows issue in `JoinSet` ([#5693])
-   rt: panic if `EnterGuard` dropped incorrect order ([#5772])
-   time: do not overflow to signal value ([#5710])
-   fs: wait for in-flight ops before cloning `File` ([#5803])

##### Changed

-   rt: reduce time to poll tasks scheduled from outside the runtime ([#5705], [#5720])

##### Added

-   net: add uds doc alias for unix sockets ([#5659])
-   rt: add metric for number of tasks ([#5628])
-   sync: implement more traits for channel errors ([#5666])
-   net: add nodelay methods on TcpSocket ([#5672])
-   sync: add `broadcast::Receiver::blocking_recv` ([#5690])
-   process: add `raw_arg` method to `Command` ([#5704])
-   io: support PRIORITY epoll events ([#5566])
-   task: add `JoinSet::poll_join_next` ([#5721])
-   net: add support for Redox OS ([#5790])

##### Unstable

-   rt: add the ability to dump task backtraces ([#5608], [#5676], [#5708], [#5717])
-   rt: instrument task poll times with a histogram ([#5685])

[#5766]: tokio-rs/tokio#5766

[#5653]: tokio-rs/tokio#5653

[#5686]: tokio-rs/tokio#5686

[#5712]: tokio-rs/tokio#5712

[#5693]: tokio-rs/tokio#5693

[#5772]: tokio-rs/tokio#5772

[#5710]: tokio-rs/tokio#5710

[#5803]: tokio-rs/tokio#5803

[#5705]: tokio-rs/tokio#5705

[#5720]: tokio-rs/tokio#5720

[#5659]: tokio-rs/tokio#5659

[#5628]: tokio-rs/tokio#5628

[#5666]: tokio-rs/tokio#5666

[#5672]: tokio-rs/tokio#5672

[#5690]: tokio-rs/tokio#5690

[#5704]: tokio-rs/tokio#5704

[#5566]: tokio-rs/tokio#5566

[#5721]: tokio-rs/tokio#5721

[#5790]: tokio-rs/tokio#5790

[#5608]: tokio-rs/tokio#5608

[#5676]: tokio-rs/tokio#5676

[#5708]: tokio-rs/tokio#5708

[#5717]: tokio-rs/tokio#5717

[#5685]: tokio-rs/tokio#5685

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Co-authored-by: cabr2-bot <cabr2.help@gmail.com>
Reviewed-on: https://codeberg.org/Calciumdibromid/CaBr2/pulls/1958
Reviewed-by: crapStone <crapstone01@gmail.com>
Co-authored-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Co-committed-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Labels
A-tokio Area: The main tokio crate M-runtime Module: tokio/runtime R-loom Run loom tests on this PR T-performance Topic: performance and benchmarks
4 participants